Abstract

The binocular perception of shape and of depth relations between objects can 
change considerably if the viewing direction is changed only by a small angle. We 
explored this effect psychophysically and found a strong depth reduction effect
for large disparity gradients. The effect is strongest for horizontally
oriented stimuli, and stronger for line stimuli than for points. This depth scaling
effect is discussed in a computational framework of stereo based on a Bayesian
approach, which allows information from different types of matching
primitives to be integrated, weighted according to their robustness.
1 Introduction

Stereoscopic vision enables us to compute depth by evaluating two different views of the 
same scene. Textbooks try to tell us that the computation of the depth of objects is 
a simple geometric operation once corresponding features in the two retinae have been 
found (i.e., the correspondence problem is solved). We know, however, that absolute 
depth computation is actually not quite so easy. The difference in the angular separation
between two points on the left and right retina (their disparity) determines depth
correctly only up to a scale factor. The two retinal images of an object become more similar
(less disparate) with increasing viewing distance and are, of course, identical for an object
infinitely far away. Hence, a given disparity corresponds to a much smaller distance from
the horopter for nearby objects than for objects far away. In order to compute absolute
depth, the disparities have to be scaled according to distance.
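
The dependence of disparity on viewing distance can be illustrated numerically. The
following is a minimal sketch, not part of the original study: it assumes two points
straight ahead of the observer, an interocular distance of 6.5 cm (an assumed typical
value), and takes the relative disparity as the difference between the vergence angles
of the two points. The function names are hypothetical.

```python
import math

ARCMIN_PER_RAD = 60.0 * 180.0 / math.pi


def vergence_angle(distance_cm, interocular_cm=6.5):
    """Angle (radians) subtended at the two eyes by a point straight ahead."""
    return 2.0 * math.atan(interocular_cm / (2.0 * distance_cm))


def relative_disparity_arcmin(distance_cm, depth_interval_cm, interocular_cm=6.5):
    """Disparity (arc min) between a point at distance_cm and a second point
    lying depth_interval_cm behind it (assumed geometry: both on the midline)."""
    near = vergence_angle(distance_cm, interocular_cm)
    far = vergence_angle(distance_cm + depth_interval_cm, interocular_cm)
    return (near - far) * ARCMIN_PER_RAD


# The same 1 cm depth interval produces roughly a quarter of the disparity
# each time the viewing distance is doubled.
for distance in (60.0, 120.0, 240.0):
    print(distance, round(relative_disparity_arcmin(distance, 1.0), 2))
```

Under these assumptions, a fixed depth interval yields a disparity that falls off
approximately with the square of the viewing distance, which is why a disparity value
alone does not specify absolute depth.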

Despite the fact that disparity decreases with the square of the viewing distance, the
perceived size and depth relations between objects do not change. This indicates
that the seeing brain applies an appropriate depth scaling. It is usually assumed
that this scaling is achieved by using the vergence angle of the eyes, or by using vertical
disparities (Mayhew & Longuet-Higgins, 1982). In a recent paper, Rogers and Cagenello
(1989) propose a mechanism for recovering shape information without the need for scaling
by using the second spatial derivative of disparity, a measure which remains invariant
with viewing distance.

In Figure 1 we can observe another instance of the fact that perceived depth is not
determined by binocular disparity alone. While the object in Figure 1a looks quite flat,
different views of the same object lead to a much more accurate perception of
three-dimensional shape (Fig. 1b-d). There are several possible explanations of the depth
scaling effect in Figure 1. The first explanation is concerned with the correspondence
problem. We have difficulties in matching the correct edges of the cube for a view like
that in Figure 1a. Matching ambiguity can be reduced for solid opaque objects (Fig. 1b)
because the removal of hidden backface lines reduces the matching problem to a
two-to-one match, as in Panum's limiting case. Another way to reduce the ambiguity is to tilt
the cube around the vertical axis (Fig. 1c). Matching ambiguity can be further reduced
by avoiding Panum's limiting case in Figure 1c through a larger rotation around the
vertical axis (Fig. 1d).

In order to analyze in a less qualitative way the effects which lead to reduction in 
perceived depth such as in Figure 1 we designed a psychophysical experiment in which 
the neighborhood relations between matching primitives could be varied in an orderly 
fashion. 

In a parametric study, we investigated the characteristics of this depth scaling effect
with simple stimuli (dots, lines, symbols) and explored its dependence on the magnitude
and orientation of the disparity gradient, which is a good measure for specifying
neighborhood relations.
Figure 1: Four stereoscopic views of the same 3D-object from slightly different viewpoints
can give rise to considerably different shape perceptions in spite of identical binocular
disparities. A stereopair of the object was generated by orthographic projection of a
cube after rotation around the vertical axis by ±3 deg (a). Matching ambiguity in (a)
can be avoided by hidden line removal (b). In (c) a "cube version" of Panum's limiting
case is shown. Matching ambiguity can be further reduced by avoiding Panum's limiting
case with different rotations around the vertical axis (d; left cube: -4 deg, right
cube: -10 deg).
Figure 2: A circle is presented to the left of the fixation point (F.P.) and in front of the
horopter. Projections to the two eyes are symbolized by circles with different orientations
of hatching, (//////) for the left eye, (\\\\\\) for the right eye. To the right of the
fixation point, a square is presented behind the horopter. It appears on the left retina
(//////) at a smaller distance from the fixation point than on the right retina (\\\\\\).
The arrows give the size of the disparity between the projections to both eyes (d_L, d_R), and
the size of the mean binocular distance between the two symbols (S_bin) (d_L = disparity of
the left stimulus (circle), d_R = disparity of the right stimulus (square), S_bin = binocular
separation between stimuli). The disparity gradient G can be defined by the ratio of the
disparity difference (d_L - d_R) and the binocular separation S_bin (see Burt and Julesz,
1980). (a) If one stimulus lies in front of the horopter while the other one is behind the
horopter, the arrows point in opposite directions; if both symbols are on the same side of
the horopter, the arrows point in the same direction. In (b), the right object is located
on the horopter plane and its disparity is therefore zero. The disparity gradient depends
only upon the disparity d_L of the left object, and upon the binocular separation S_bin. (c)
shows a bird's eye view of the stimulus configuration of (a).



2 Material and Methods 

In order to have precise and flexible control over the neighborhood relations of our stimuli
(dots, lines or symbols) we displayed them on a digital CRT-monitor (Hewlett Packard
1347 A) with short persistence phosphor (P31) and a 2048x1513 addressable resolution.
The CRT-monitor is a vector display unit with internal refresh memory, which allowed us
to display vectors with a very high frame rate (1000 Hz). The monitor was programmed
in HP-GL and controlled via an RS-232 interface from an LSI 11/23 (Digital™). Dot size
and line width were 2.3 arc min at a viewing distance of 60 cm and their luminance was
about 10 cd/m^2 as measured over a filled 1 deg area with a spot-photometer (Minolta™).
Background illumination by indirect overhead incandescent lighting was about 10 cd/m^2.
The stimulus consisted of two points in both eyes, separated either horizontally, vertically
or obliquely, with disparity differences between the two points of up to 54 arc min.
In the latter case, one stimulus showed 27 arc min of crossed disparity (in front of the
screen, cf. Fig. 2c) while the other stimulus had 27 arc min of uncrossed disparity (behind
the screen). The depth gradient G between two stimuli can be defined as the difference
in disparity between the two stimuli (d_R, d_L) divided by the binocular separation:
G = (d_R - d_L) / S_bin (Fig. 2; as defined by Burt & Julesz, 1980). If one of the two stimuli
is fused, one of the terms d_R or d_L becomes zero, and the gradient G is defined as
the disparity of the non-fused stimulus divided by the mean distance between the two
stimuli (Fig. 2b). For zero disparity gradient both points lie in the same depth plane.
Panum's limiting case corresponds to a disparity gradient of 2.0 (cf. G = 2 in Fig. 3).
If Panum's limiting case is exceeded, i.e., the disparity gradient is larger than 2.0, the
ordering constraint is violated and the stimuli are in the so-called "forbidden zone" (cf.
G = 3 in Fig. 3; Burt & Julesz, 1980; Krol & van de Grind, 1980). In our experiments
the disparity gradient was varied between 0.3 and 1.9 in 0.2 steps.
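
The gradient definition above can be made concrete with a short numerical sketch. It is
an illustration only: the input format (horizontal positions of the two stimuli in each
eye, in arc min), the sign convention for disparity (right-eye minus left-eye position)
and all function names are assumptions, and the binocular separation is taken as the
distance between the mean positions of the two stimuli across both eyes.

```python
import math


def disparity_gradient(left_eye_x, right_eye_x):
    """Return the magnitude of G for two stimuli.

    left_eye_x and right_eye_x are (x1, x2) pairs giving the horizontal
    position (arc min) of stimulus 1 and stimulus 2 in each eye
    (hypothetical input format; disparity sign convention is assumed).
    """
    d1 = right_eye_x[0] - left_eye_x[0]
    d2 = right_eye_x[1] - left_eye_x[1]
    # Binocular separation: distance between the mean positions of the
    # two stimuli, averaged over both eyes.
    c1 = (left_eye_x[0] + right_eye_x[0]) / 2.0
    c2 = (left_eye_x[1] + right_eye_x[1]) / 2.0
    s_bin = abs(c2 - c1)
    if s_bin == 0.0:
        raise ValueError("stimuli coincide binocularly; gradient undefined")
    return abs(d2 - d1) / s_bin


def classify(gradient):
    """Label a gradient relative to Panum's limiting case (G = 2)."""
    if math.isclose(gradient, 2.0):
        return "Panum's limiting case"
    return "forbidden zone" if gradient > 2.0 else "ordering preserved"


# The nine gradients used in the experiments: 0.3 to 1.9 in steps of 0.2.
TESTED_GRADIENTS = [round(0.3 + 0.2 * i, 1) for i in range(9)]


def separation_for(disparity_difference, gradient):
    """Binocular separation (arc min) needed for a given disparity difference."""
    return disparity_difference / gradient


# Panum's limiting case: the two dots coincide in one eye and are
# separated in the other eye.
panum = disparity_gradient(left_eye_x=(0.0, 0.0), right_eye_x=(0.0, 10.0))
print(panum, classify(panum))
print(TESTED_GRADIENTS)
print(round(separation_for(54.0, 1.9), 1))
```

With this construction, two dots that coincide in one eye and are separated in the
other always give G = 2, whatever the separation, which is why Panum's limiting case
marks the boundary of the ordering constraint.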

A bright rectangular frame sized 4.5x5.3 deg for both eyes served as a fusion pattern. 
This pattern was constantly displayed between stimulus presentations. Stimuli were
presented immediately after the observer's response to the preceding stimulus, either for
100 msec to prevent eye movements (in a pilot study) or for a duration of 7 sec, since
100 msec was too short for most observers to evaluate depth under these conditions. All
results shown are for this longer presentation time. In additional experiments, the dots 
were connected by lines or replaced by different symbols. The size of the symbols was 
either small (5 arc min) or large (24 arc min). Altogether, each subject saw four kinds 
of stimuli in three different orientations (horizontal, oblique and vertical) and with eight 
different disparities and nine different gradients. As every stimulus was presented at least 
twice, each observer went through at least 2000 presentations including the test trials. 
Head motion was restricted by a headrest. A black vertical screen, running from the
center of the monitor towards the subject's nose, separated the visual fields of both eyes. The
stimuli for both eyes were always presented simultaneously to the corresponding halves 
of the monitor. High quality spherical lenses with a power of 1.5 diopters in front of both 
eyes prevented accommodation and the convergence associated with accommodation. 
Figure 3: Viewing geometry as in Fig. 2, illustrating different disparity gradients ranging
from 0.5 through 2.0 (Panum's limiting case) to 3.0 (forbidden zone, i.e., violation of the
ordering constraint).
Figure 4: Illustration of how a stimulus pattern with large symbols appears together with
the reference lines on the CRT-monitor. The experimental equivalent can be experienced
by uncrossed fusion of the left and right stereo images. The square symbol should
appear behind the zero disparity plane at about the height of the third line from the top
(15 arc min).

The subjects estimated the perceived depth of the two stimuli by indicating where the 
points seemed to be relative to a depth scale that was displayed simultaneously (Fig. 4). 
This scale was produced by 10 lines of length 3 deg, with distances of 22 arc min between 
lines and differences in disparity of 5 arc min. This results in reference lines located at 5, 
10, 15, 20, and 25 arc min in front of and behind the plane of the fusion pattern (frame 
at zero disparity). The depth gradient between these reference lines thus was 0.25. After 
each presentation, the subject had to decide which of the reference lines was closest to the 
near or far stimulus. In a single experiment each combination of disparity and disparity 
gradient was tested in a pseudo-random order at least twice for each observer. In any 
one session, observers typically ran six experiments in about 2 hours. Three of the
ten subjects were experienced observers who had previously participated in a number
of stereoscopic experiments. The remainder were students and staff of Tübingen
University who volunteered to participate in the experiments. They were naive
as to the purpose of the investigation. There was no significant difference between the
results of the naive subjects and those of the authors. Each of the observers first had to
run a null series to adapt to the task before the experiments proper. The length of this
null series varied considerably depending on experience and other subjective factors of
the observers. All observers had normal or corrected-to-normal vision.
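
The depth-matching procedure described above can be sketched in a few lines. This is a
hedged illustration rather than the analysis actually used: the sign convention
(negative values in front of the frame, positive behind), the function names, and the
conversion of a matched reference line into a percentage of the displayed disparity are
all assumptions made for the example.

```python
STEP_ARCMIN = 5

# Ten reference lines at 5, 10, 15, 20 and 25 arc min in front of
# (negative, assumed convention) and behind (positive) the fusion frame.
REFERENCE_LINES = sorted(
    sign * STEP_ARCMIN * k for sign in (-1, 1) for k in range(1, 6)
)


def nearest_reference_line(perceived_arcmin):
    """Reference line closest to a perceived disparity (hypothetical input)."""
    return min(REFERENCE_LINES, key=lambda ref: abs(ref - perceived_arcmin))


def percent_of_displayed(matched_arcmin, displayed_arcmin):
    """Matched depth expressed as a percentage of the displayed disparity."""
    return 100.0 * abs(matched_arcmin) / abs(displayed_arcmin)


matched = nearest_reference_line(13.0)
print(REFERENCE_LINES)
print(matched, round(percent_of_displayed(matched, 27.0), 1))
```

Because responses are quantized to the nearest reference line, individual judgments
have a resolution of 5 arc min; finer values arise only from averaging over trials
and observers.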
Figure 5: Perceived depth in percent of displayed disparity of two points as a function of
disparity and disparity gradient. The perceived depth at 27 arc min disparity is reduced
to about 50% for large disparity gradients. Means (a) and standard deviations (b) of ten
observers.

3 Results 



In a first set of experiments, the subjective depth difference for two dots was determined. 
The stimuli were tested with disparity differences at each separation chosen to produce 
disparity gradients between 0.3 and 1.9. We tested the steep gradients above 1, though 
it is known from previous work (Burt & Julesz, 1980) that the dots appear diplopic and 
cannot be fused at gradients above 1. However, even without fusion of the dots and in 
the presence of binocular rivalry, one is still able to see the dots at different depths. A 
good example of this is Panum's limiting case where for one eye the two dots lie directly 
along the visual axis and therefore only one dot is seen by this eye while the two dots 
are seen separated by the other eye (G = 2 in Fig. 3). It seems to be impossible to 
fuse one dot with two other dots in the other eye, but one actually perceives two stimuli 
separated in depth under these conditions. The perceived depth difference between the 
test dots, however, decreases if they approach each other while their disparity difference 
is kept constant. If the disparity of each of the points stays constant during the approach, 
the points stay on the same frontoparallel planes, while the disparity gradient increases. 
Subjectively, however, the dots seem to leave their frontoparallel planes and to approach 
each other not only laterally, but also in depth: they both approach the horopter but 
from different sides. 

Figure 6: Perceived depth in percent of displayed depth as a function of depth gradient
for horizontally oriented stimuli consisting of points, lines and small or large symbols.
Means of ten subjects and nine different disparities (3-27 arc min). The standard error
of the means is in the order of the symbol size.

Figure 5a shows the perceived disparity of two points (horizontal orientation) as the
percentage of the displayed disparity for different disparities and disparity gradients. This
3D bar graph demonstrates that the subjective depth depends not only on the disparity
but also on the disparity gradient between the points. The decrease in subjective depth 
with increasing disparity gradient is most evident for larger disparities. For a disparity
of 27 arc min, which is outside the static Panum's fusional area for bright bars (Schor
& Tyler, 1981; Schor & Wood, 1983), but still inside Panum's area for certain stimuli (cf.
Fender & Julesz, 1967; Richards & Kaye, 1974; Kulikowski, 1978; Schor, Wood & Ogawa,
1984), the perceived depth decreases to about 50% when the disparity gradient reaches
1.9, that is, near Panum's limiting case. The decrease of perceived depth with increasing
depth gradient is more pronounced with line stimuli (less than 40%) than with point
stimuli or symbols (Fig. 6). It is also clearly more pronounced for horizontal orientation
of the stimuli (0 deg) than for vertical (90 deg) or oblique (45 deg) orientation (Fig. 7).

For all types of stimuli, an increase of either the depth gradient or the disparity
leads to a decrease in the ratio of perceived depth to displayed disparity (cf. Fig. 5
for the results of point stimuli). For a disparity gradient of 1.9 at a disparity of 27 arc min,
the perceived depth difference for horizontal line stimuli amounts to just below 40% of
the depth difference to be expected on the basis of the disparities of the stimuli (Fig. 6).
If instead of two points two different and easily discriminable symbols are presented, 
the decrease in perceived depth is less dramatic than with point stimuli, at least with a 
horizontal orientation of the stimuli (Fig. 6). The results for small sized symbols resemble 
those for point stimuli (Fig. 6). The decrease in subjective depth difference is clearly 
more pronounced at all gradients for horizontal lines than for horizontally arranged small 
decrease is insignificant if easily discriminable (large) symbols are shown instead of
points (Fig. 6). 

• The decrease in perceived depth for all kinds of stimuli is clearly more pronounced 
for horizontal orientation than for oblique or vertical orientation (Fig. 7). 

The subjective decrease in perceived depth (or perceived depth difference) might
be attributed to a tendency of the brain to make more conservative depth estimates when
the relative position of a stimulus cannot be determined by the visual system with the
accuracy necessary for an exact localization in depth (Yuille, Bülthoff & Fahle, 1987).
If the localization mechanism has a spatial uncertainty of a defined extent, this
uncertainty corresponds to a larger range of depth for steep gradients than for
shallow gradients. Under these conditions, the visual system chooses the interpretation
that, given a defined uncertainty in the determination of the relative positions
of the stimuli, corresponds to the smallest overall disparity difference, as is the case in the
double-nail illusion (cf. also Krol & van de Grind, 1980).

A depth reduction effect has been reported also for the endpoints of lines which seem 
to lie on a much flatter depth gradient than two isolated points. This effect was shown 
for very shallow gradients near the absolute thresholds for stereo acuity (Werner, 1937; 
McKee, 1983; Mitchison & Westheimer, 1984; Fahle & Westheimer, 1988). 

A decrease in perceived subjective depth at constant disparity similar to the one
investigated in this study has also been described for the case in which the two elements
of a stereoscopic stimulus are presented to the two eyes with an interocular delay. Under
these conditions, too, the subjective depth of the stimulus gradually approaches the
fixation plane for interocular delays around 100 msec, depending on stimulus duration
(Ogle, 1963; Aulhorn, 1971; Herzau, 1976). A possible link between these two observations
could be the uncertainty in spatial and temporal localization of corresponding features.
A decrease in the accuracy of exact spatial localization might be due to lateral, especially
horizontal, interactions between the stimuli when they are close together (especially below
10 arc min, cf. Westheimer & Hauske, 1975). Such interactions could account for the
decrease of the perceived difference in depth with increasingly steep depth gradients.
The effects of orientation indicate the possible nature of these interactions. Since the 
depth scaling is strongest for a horizontal orientation of the stimuli, the depth scaling 
effect might be based upon difficulties in solving the correspondence problem. As can 
be seen from Figure 3, the possibility of false matching is much more pronounced for 
steep than for shallow gradients, i.e., if the stimuli are closer to each other. For oblique 
orientations, matching ambiguity can be largely reduced by making use of the epipolar 
line constraint (Mayhew & Frisby, 1981; Yuille & Poggio, 1984). In principle, matching
ambiguity can be completely avoided by using different symbols. As can be seen in
Fig. 6, the depth reduction effect disappears almost completely for large symbols, which
are easily discriminable. Matching ambiguity is also strongest for horizontally oriented
lines because in principle each point can be matched with each point on the line
in the other eye. It is therefore not surprising that the depth reduction effect is strongest 
for horizontal lines. 

Note, however, that the false matching argument in the case of two points would be
of a rather indirect and complicated nature, as false matches would lie in the "forbidden
zone" and this would steepen rather than flatten the gradient in the first place. We
know, however, from Krol and van de Grind's experiments on the double-nail illusion that
objects in the forbidden zone are more likely to be seen side by side (small gradient) than
veridically, one behind the other (steep gradient).

The depth scaling effect is well in line with a computational framework of stereo based
on Markov random fields and a Bayesian approach to vision (Yuille, Geiger & Bülthoff,
1989). This theory allows information from different types of matching
primitives (weighted according to their robustness), or from different vision modules,
to be integrated. Unlike previous theories of stereo, which first solved the correspondence
problem and then constructed a surface by interpolation (Grimson, 1981), this theory
combines the two stages. The correspondence problem is solved to give the disparity field
which best satisfies the a priori constraints. As in other theories, a smoothness term is
required to give a unique matching for solving the correspondence problem, but in this
theory its importance increases as the matching primitives become more similar. If the
features are sufficiently different (perhaps pre-attentively discriminable), the matching
ambiguity does not exist and the correct disparities can be computed. If the features become
more similar or less separable, then a priori assumptions like smoothness must be used to
obtain a unique match. The greater the similarity between features, the greater the need
for smoothness and hence the stronger the bias towards the fronto-parallel plane. The
matching ambiguity and thereby the depth reduction effect can be reduced by:

(1) increasing the retinal separation between features (i.e., smaller disparity gradient); 

(2) using additional constraints like the epipolar line constraint for non-horizontal lines;

(3) using figural or size differences for (pre-attentive) discrimination.

These are exactly the three points which have been addressed experimentally in this paper
and which showed a significant influence on perceived depth.
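
The qualitative behaviour of such a smoothness bias can be illustrated with a toy
calculation. The sketch below is not the Markov random field model of Yuille, Geiger &
Bülthoff (1989); it is a deliberately simplified one-dimensional stand-in in which a
Gaussian data term is combined with a quadratic penalty on the disparity gradient. All
names and parameter values (noise_sd, smoothness_weight) are hypothetical.

```python
def map_disparity_difference(observed_diff, separation, noise_sd, smoothness_weight):
    """Toy MAP estimate of the disparity difference between two features.

    Minimizes, over x,
        (x - observed_diff)**2 / (2 * noise_sd**2)
        + smoothness_weight * (x / separation)**2 / 2,
    i.e. a Gaussian data term plus a quadratic penalty on the disparity
    gradient x / separation. The minimum has the closed form below.
    """
    shrink = 1.0 + smoothness_weight * noise_sd ** 2 / separation ** 2
    return observed_diff / shrink


def percent_perceived(observed_diff, separation, noise_sd, smoothness_weight):
    """Estimated depth as a percentage of the displayed disparity difference."""
    estimate = map_disparity_difference(
        observed_diff, separation, noise_sd, smoothness_weight
    )
    return 100.0 * estimate / observed_diff


# Same disparity difference (54 arc min) at a shallow and a steep gradient.
shallow = percent_perceived(54.0, separation=54.0 / 0.3, noise_sd=10.0, smoothness_weight=5.0)
steep = percent_perceived(54.0, separation=54.0 / 1.9, noise_sd=10.0, smoothness_weight=5.0)
# A smaller smoothness weight stands in for easily discriminable features.
distinct = percent_perceived(54.0, separation=54.0 / 1.9, noise_sd=10.0, smoothness_weight=0.5)
print(round(shallow, 1), round(steep, 1), round(distinct, 1))
```

In this toy model the estimate shrinks towards zero disparity difference as the
separation decreases or as the smoothness weight grows, which mirrors points (1) and
(3) above. It does not capture the orientation dependence of point (2), nor the
dependence on disparity magnitude at a fixed gradient.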



°A preliminary version of this paper was presented at the 10th ECVP at Bad Nauheim 1986:
Perception of disparity gradients. Perception 15, A 41 (1986).